Distinguishing Proteins From Arbitrary Amino Acid Sequences
Authors
Abstract
What kinds of amino acid sequences could possibly be protein sequences? Across all the existing databases we could find, known proteins make up only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Moreover, if the test indicates that a sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate that our computational test will be useful for those attempting to complete the job of discovering all proteins or constructing the protein universe.
Similar resources
Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Alignment Search Method
Degenerate primer-based polymerase chain reaction (PCR) is commonly used to isolate unidentified gene sequences in related organisms. To design the degenerate primers, we propose using a local alignment search method to find conserved regions long enough for an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...
An Evolutionary Relationship Between Stearoyl-CoA Desaturase (SCD) Protein Sequences Involved in Fatty Acid Metabolism
Background: Stearoyl-CoA desaturase (SCD) is a key enzyme that converts saturated fatty acids (SFAs) to monounsaturated fatty acids (MUFAs) in fat biosynthesis. Despite being crucial for interpreting SCDs’ roles across species, the evolutionary relationship of SCD proteins across species has yet to be elucidated. This study aims to present this evolutionary relationship based on amino aci...
Detection of Mutations of Antimutator Gene pfpI in Pseudomonas aeruginosa Species Isolated from Burn Patients in Tehran, Iran
Introduction: Pseudomonas aeruginosa is an opportunistic pathogen of clinical importance, particularly in immunocompromised and burn patients. This bacterium is becoming resistant to many antibiotics via intrinsic or acquired mechanisms. Mutations in anti-mutator genes, such as pfpI, can be a potential intrinsic mechanism of antibiotic resistance. This study aimed to evaluate the possible effec...
Nucleotide and Amino Acid Changes in HN, F and SH genes of an Iranian Mumps Virus; RS-12, Following Attenuation to Vaccine Strain
Background and Aims: The wild-type RS-12 strain of mumps virus was isolated from an Iranian patient and attenuated through several serial passages. This study was designed to determine nucleotide and amino acid substitutions in the HN, F and SH genes during attenuation of the wild-type virus. Materials and Methods: The required viral samples were prepared at Razi Vaccine and Serum Institute. Vi...
GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION
This paper considers the generation of some interpretable fuzzy rules for assigning an amino acid sequence into the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of rules, we have used the distribution of amino acids in the sequences of proteins as features. These features are the occurrence probabilities of six exchange groups in the seque...